# Reinforcement learning fine-tuning

**Unireason Qwen3 14B RL GGUF** (mradermacher, Apache-2.0)
A static quantization of UniReason-Qwen3-14B-RL, suitable for text generation and mathematical reasoning research.
Tags: Large Language Model, Transformers, English · Downloads: 272 · Likes: 1
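
Static GGUF quantizations like this one are usually run with a llama.cpp-compatible runtime. Below is a minimal sketch using llama-cpp-python; the local file name and the sampling settings are illustrative assumptions, not taken from this repository.

```python
# Minimal sketch: running a static GGUF quantization with llama-cpp-python.
# The file name is a placeholder; use whichever quant file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="UniReason-Qwen3-14B-RL.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Prove that the sum of two odd integers is even."}],
    max_tokens=512,
    temperature=0.2,
)
print(out["choices"][0]["message"]["content"])
```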
**Vigorl 7b Spatial** (gsarch)
ViGoRL is a vision-language model fine-tuned with reinforcement learning to explicitly tie textual reasoning steps to visual coordinates, enabling precise visual reasoning and grounding.
Tags: Text-to-Image, Transformers · Downloads: 319 · Likes: 1
**Deepseek R1 Distill Qwen 14B GRPO Taiwan Spirit** (kartd)
A fine-tuned version of DeepSeek-R1-Distill-Qwen-14B trained with the GRPO method, suitable for text generation tasks.
Tags: Large Language Model, Transformers · Downloads: 111 · Likes: 1
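
GRPO fine-tunes like this are commonly produced with Hugging Face TRL's `GRPOTrainer`. The sketch below shows the general pattern with a toy length-based reward; the base model, dataset, and reward function are placeholders, not the recipe behind this checkpoint.

```python
# Minimal GRPO sketch with TRL; model, dataset, and reward are illustrative.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_brevity(completions, **kwargs):
    # Toy reward: prefer completions close to 200 characters.
    return [-abs(200 - len(c)) / 200.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # any dataset with a "prompt" column

args = GRPOConfig(
    output_dir="grpo-demo",
    per_device_train_batch_size=4,
    num_generations=4,  # completions sampled per prompt for the group-relative baseline
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small stand-in base model
    reward_funcs=reward_brevity,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```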
**Codev R1 Qwen 7B** (zhuyaoyu)
CodeV-R1-Qwen-7B is obtained by reinforcement-learning fine-tuning of Qwen/Qwen2.5-Coder-7B-Instruct within the CodeV-R1 framework. It targets Verilog-related tasks, addressing the automatic generation of hardware description languages in electronic design automation.
Tags: Large Language Model, Transformers · Downloads: 138 · Likes: 1
**Xgen Small 9B Instruct R** (Salesforce)
xGen-small is an enterprise-grade compact language model that delivers long-context performance at predictably low cost through domain-focused data curation, scalable pre-training, length extension, and reinforcement learning fine-tuning.
Tags: Large Language Model, Transformers, English · Downloads: 97 · Likes: 4
**Phi 4 Reasoning Plus GGUF** (lmstudio-community, MIT)
Phi-4-reasoning-plus is a large language model developed by Microsoft with enhanced reasoning capabilities, specifically optimized for complex mathematical problems and multi-step reasoning tasks.
Tags: Large Language Model, Multilingual · Downloads: 5,205 · Likes: 4
**Openhands Lm 7b V0.1 GGUF** (Mungert, MIT)
OpenHands LM is an open-source coding model built on the Qwen2.5 Coder Instruct series, which performs strongly on software engineering tasks thanks to targeted fine-tuning.
Tags: Large Language Model, English · Downloads: 1,131 · Likes: 2
**Ablation 141 A128.dpo.armorm.rp Shisa V2 Llama 3.1 8b** (shisa-ai)
A language model fine-tuned with the DPO method, suitable for text generation tasks.
Tags: Large Language Model, Transformers · Downloads: 38 · Likes: 2
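
DPO fine-tuning of this kind is typically run with TRL's `DPOTrainer` on a dataset of chosen/rejected preference pairs. A minimal sketch, assuming a small stand-in base model and a public preference dataset rather than the actual Shisa V2 data:

```python
# Minimal DPO sketch with TRL; the model and preference dataset are stand-ins.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference data with chosen/rejected responses.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="dpo-demo",
    per_device_train_batch_size=2,
    beta=0.1,  # strength of the KL-style penalty toward the reference policy
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```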
**Ice0.101 20.03 RP GRPO 1** (icefog72, Apache-2.0)
A Mistral-based model optimized with the Unsloth framework and Hugging Face's TRL training library, trained roughly 2x faster.
Tags: Large Language Model, Transformers, English · Downloads: 55 · Likes: 2
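
The 2x figure refers to Unsloth's optimized fine-tuning path, which is usually paired with TRL roughly as below. The base checkpoint, LoRA settings, and dataset are assumptions for illustration, not this model's recipe, and exact argument names vary slightly across TRL versions.

```python
# Rough Unsloth + TRL sketch; base model, LoRA settings, and data are assumptions.
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load a 4-bit base model through Unsloth's optimized loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-instruct-v0.3-bnb-4bit",  # illustrative base
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder SFT data

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(output_dir="unsloth-demo", per_device_train_batch_size=2),
)
trainer.train()
```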
**Llama 3.1 Tulu 3.1 8B** (allenai)
Tülu 3 is a leading family of instruction-following models with fully open data, code, and training recipes, intended as a comprehensive guide to modern post-training techniques. Version 3.1 improves the reinforcement learning stage, delivering better overall performance.
Tags: Large Language Model, Transformers, English · Downloads: 3,643 · Likes: 33
**Ppo Tldr** (vwxyzjn)
A fine-tuned version of EleutherAI/pythia-1b-deduped, trained with PPO to generate concise TL;DR summaries.
Tags: Large Language Model, Transformers · Downloads: 15 · Likes: 1
**Llama 3 NeuralPaca 8b** (NeuralNovel)
A model based on Meta Llama-3-8B, trained with the Unsloth optimization framework and the Hugging Face TRL library for roughly a 2x training speedup.
Tags: Large Language Model, Transformers, English · Downloads: 21 · Likes: 7
**Blip Image Captioning Base Mocha** (moranyanuka, MIT)
Official checkpoint of the BLIP base model, fine-tuned on MS-COCO with the MOCHA reinforcement learning framework to mitigate open-vocabulary caption hallucination.
Tags: Image-to-Text, Transformers · Downloads: 88 · Likes: 1
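
A MOCHA-tuned BLIP checkpoint is used just like the standard BLIP captioning model in transformers. A minimal sketch, assuming the repo id `moranyanuka/blip-image-captioning-base-mocha` and a placeholder image URL:

```python
# Minimal captioning sketch; the repo id and image URL are assumptions.
import requests
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

repo_id = "moranyanuka/blip-image-captioning-base-mocha"  # assumed repo id
processor = BlipProcessor.from_pretrained(repo_id)
model = BlipForConditionalGeneration.from_pretrained(repo_id)

url = "https://example.com/cat.jpg"  # placeholder image URL
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```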